Graph Retrieval-Augmented Generation: A Survey
https://gyazo.com/5ab30b83b00a9bd520b88793b9dbe9ce
GPT5.icon
A comprehensive survey that systematizes techniques for compensating for the weaknesses of plain RAG (ignored relationships, redundant context, no view of the big picture) by exploiting graph structure (KGs/TAGs), enabling more accurate, context-aware generation.
What's new:
GraphRAG formulation:
Exploits relational knowledge and global structure that text-only RAG struggles to capture, giving it an advantage on query-focused summarization (QFS) and similar tasks.
G-Indexing (graph construction and indexing)
Data sources: open KGs (Wikidata, ConceptNet, etc.) or self-constructed graphs (TAGs built from documents, tables, logs).
Index types:
Graph index (structure-based traversal and search)
Text index (template-based verbalization → BM25/dense retrieval)
Vector index (node/ego-network embeddings)
A hybrid of these is recommended.
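The recommended hybrid index can be sketched roughly as below. Everything here is a toy assumption, not the survey's implementation: the node texts are invented, a term-overlap score stands in for BM25, and a bag-of-words counter stands in for a learned embedding.

```python
# Hybrid index sketch: keep both a keyword index (term-overlap score as a
# BM25 stand-in) and a vector index (bag-of-words "embedding" + cosine),
# then merge the two rankings with a weight alpha.
import math
from collections import Counter

nodes = {
    "n1": "Marie Curie won the Nobel Prize in Physics",
    "n2": "Pierre Curie studied radioactivity in Paris",
    "n3": "Graph neural networks encode node neighborhoods",
}

def embed(text):
    # Toy embedding: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def keyword_score(query, text):
    # Fraction of query terms appearing in the node text.
    q, d = set(query.lower().split()), set(text.lower().split())
    return len(q & d) / len(q)

def hybrid_search(query, alpha=0.5, k=2):
    q_vec = embed(query)
    scored = [
        (nid, alpha * keyword_score(query, text) + (1 - alpha) * cosine(q_vec, embed(text)))
        for nid, text in nodes.items()
    ]
    return sorted(scored, key=lambda x: -x[1])[:k]
```

In a real system the two rankings would come from an inverted index and an ANN store, but the merge step looks the same.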
G-Retrieval
Retrievers: non-parametric (BFS, shortest path, PCST), LM-based, GNN-based, or combinations.
Paradigms: one-shot (batch) / iterative (adaptive or non-adaptive) / multi-stage (coarse-to-fine).
Granularity: nodes, triples, paths, subgraphs, or a mix.
Enhancements: query expansion/decomposition; knowledge merging/pruning (re-ranking, PageRank, LLM-based checks, etc.)
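A multi-stage (coarse-to-fine) retrieval pass can be sketched over a toy KG: stage 1 is a non-parametric BFS expansion around seed entities; stage 2 prunes the candidate triples with a simple query-overlap score, standing in for a re-ranker or LLM check. The graph and the scorer are hypothetical.

```python
# Coarse-to-fine retrieval sketch: BFS neighborhood expansion, then pruning.
from collections import deque

triples = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Pierre Curie", "field", "Radioactivity"),
    ("Nobel Prize", "awarded_in", "Stockholm"),
]

def neighbors(entity):
    for h, r, t in triples:
        if h == entity:
            yield (h, r, t), t
        elif t == entity:
            yield (h, r, t), h

def bfs_subgraph(seeds, hops=2):
    """Stage 1 (coarse): collect all triples within `hops` of the seeds."""
    seen, frontier, collected = set(seeds), deque((s, 0) for s in seeds), []
    while frontier:
        node, d = frontier.popleft()
        if d == hops:
            continue
        for triple, nxt in neighbors(node):
            if triple not in collected:
                collected.append(triple)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return collected

def prune(candidates, query, k=2):
    """Stage 2 (fine): keep the top-k triples by token overlap with the query."""
    q = set(query.lower().split())
    score = lambda tr: len(q & set(" ".join(tr).lower().split()))
    return sorted(candidates, key=score, reverse=True)[:k]

subgraph = bfs_subgraph(["Marie Curie"], hops=2)
evidence = prune(subgraph, "where was the nobel prize awarded")
```

A production retriever would swap the overlap score for a cross-encoder or LLM judgment, but the two-stage shape is the same.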
G-Generation
Generator:
GNN (strong on discriminative tasks)
LM (strong on generation and reasoning)
Hybrid: cascaded (GNN output fed to the LM as a prefix) or parallel (representation fusion and output integration).
LM input formats for graphs:
Graph languages (edge tables, natural sentences, code-like notation, syntax trees, node sequences)
Graph embeddings (fused via prefix/prompt tuning or FiD).
Generation enhancement: pre- (query rewriting, planning) / mid- (constrained decoding, etc.) / post-generation (answer integration and verification).
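Two of the "graph language" formats listed above can be sketched for a toy triple list: an edge table and natural-sentence verbalization. Either string can go straight into an LM prompt; the data and prompt wording are illustrative assumptions.

```python
# Graph-to-text sketch: serialize triples as an edge table or as sentences.
triples = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("Marie Curie", "spouse", "Pierre Curie"),
]

def edge_table(trs):
    # Tab-separated head / relation / tail rows.
    return "\n".join(f"{h}\t{r}\t{t}" for h, r, t in trs)

def natural_sentences(trs):
    # One short sentence per triple; underscores in relations become spaces.
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in trs)

prompt = (
    "Answer using the following facts.\n"
    "Facts: " + natural_sentences(triples) + "\n"
    "Question: Who won the Nobel Prize?"
)
```

Short, complete verbalizations like this are what the practical tips below recommend feeding to the LM.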
Training
Training-free: rule-based retrieval + prompting, embedding similarity.
With training: supervised, distantly supervised, RL, or self-supervised optimization of the retriever/generator.
Joint training: alternating or integrated training so retriever and generator reinforce each other.
Application and evaluation
Tasks: KBQA/CSQA, entity linking/relation extraction (EL/RE), fact verification, link prediction, dialogue, recommendation.
Domains: e-commerce, medicine, academia, law, literature, etc.
Benchmarks: WebQSP, CWQ, HotpotQA, etc.; STaRK, GraphQA, GRBENCH, CRAG.
Metrics: EM/F1/Accuracy/BLEU-style scores plus retrieval quality (coverage, diversity, faithfulness).
Industry Case Studies
Microsoft GraphRAG (QFS enhanced with community summaries), NebulaGraph's implementation, Ant Group's implementation, Neo4j NaLLM/Graph Builder, etc.
Future Issues
Dynamic graph updating and adaptation, multimodal integration, efficiency on large graphs,
integration with Graph Foundation Models, lossless compression to mitigate long-context problems, standardized benchmark development, and broader applications.
Practical tips (quick summary)
Even at small scale, hybrid indexing + multi-stage retrieval optimizes the accuracy/cost trade-off.
Choose retrieval granularity (path- or subgraph-centric) to match the task.
Feed the LM short, self-contained, readable graph language (community summaries when needed).
Re-ranking/pruning and post-hoc verification stabilize the critical stages.
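The post-verification tip can be sketched minimally: before accepting a generated answer, check that every entity it mentions appears in the retrieved evidence, and flag it for regeneration otherwise. The function and data names are illustrative, not from the survey.

```python
# Post-verification sketch: answer entities must be grounded in evidence.
def unsupported(answer_entities, evidence_triples):
    # Entities known to the evidence are the heads and tails of its triples.
    known = {x for h, r, t in evidence_triples for x in (h, t)}
    return [e for e in answer_entities if e not in known]

evidence = [("Marie Curie", "won", "Nobel Prize")]
missing = unsupported(["Marie Curie", "Albert Einstein"], evidence)
# A non-empty `missing` list signals an unsupported (possibly hallucinated) entity.
```

Real pipelines would do this with entity linking plus an LLM entailment check, but the grounding test has this shape.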
---